AmpNet - A Highly Available Cluster Interconnection Network
Authors
Abstract
One of the most important challenges facing computing clusters in the foreseeable future is providing fault-tolerant, high-availability cluster hardware for non-stop applications. This capability is needed in addition to high throughput and low latency. This paper presents the Advanced MultiProcessor Network (AmpNet), a gigabit-speed cluster interconnect designed with these issues in mind. The AmpNet Network Interface Card (NIC) uses network-shared memory as a network cache to provide a fault-tolerant, self-healing network with no data loss. Higher-level network-centric services use network-shared memory to ensure high availability and continuity of service in applications. In addition, the programmable NIC, with low-latency messaging protocols and field-upgradeable soft logic, provides a foundation for researchers who would like to develop additional cluster services and protocols for network-centric computing. This paper describes the fault-tolerant design and implementation of the AmpNet hardware architecture.
Similar resources
Performance Analysis of a New Neural Network for Routing in Mesh Interconnection Networks
Routing is one of the basic parts of a message passing multiprocessor system. The routing procedure has a great impact on the efficiency of a system. Neural algorithms that are currently in use for computer networks require a large number of neurons. If a specific topology of a multiprocessor network is considered, the number of neurons can be reduced. In this paper a new recurrent neural ne...
Computing in the RAIN: a reliable array of independent nodes - IEEE Transactions on Parallel and Distributed Systems
The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes...
Memory Hierarchy Considerations for Cost-effective Cluster Computing
Using off-the-shelf commodity workstations and PCs to build a cluster for parallel computing has become a common practice. The cost-effectiveness of a cluster computing platform for a given budget and for certain types of applications is mainly determined by its memory hierarchy and the interconnection network configurations of the cluster. Finding such a cost-effective solution from exhaustive sim...
Performance Analysis of Interconnection Networks for Multi-cluster Systems
With the current popularity of cluster computing systems, it is increasingly important to understand the capabilities and potential performance of various interconnection networks. In this paper, we propose an analytical model for studying the capabilities and potential performance of interconnection networks for multi-cluster systems. The model takes into account stochastic quantities as well ...
Publication date: 2003